This document provides a summary of various data sources used in our analysis, along with references to the R code that handles data reading and cleaning.
We are examining daily climate data from temperature stations across Canada, focusing on five key stations that are most representative of the entire province of British Columbia. The dataset includes the following columns:
- x: Longitude
- y: Latitude
- LOCAL_DATE: Date in the format year-month-day
- TOTAL_PRECIPITATION: Total precipitation in mm
- STATION_NAME: Name of the station
- MEAN_TEMPERATURE: Mean of the daily maximum and minimum temperatures
- MAX_TEMPERATURE: Daily maximum temperature
- MIN_TEMPERATURE: Daily minimum temperature
- TOTAL_RAIN: Total rainfall
- MIN_REL_HUMIDITY: Minimum relative humidity
- LOCAL_YEAR: Year
- LOCAL_MONTH: Month

Our goal is to identify stations with data extending back to 1941 in order to compare the heatwave of 1941 with that of 2021.
Link to data source: Daily Climate Station Data
Using the link above, you can locate the desired station with the map tool, either by scrolling through the map or by entering the station name or ID in the search bar.
Please ensure you download the data in CSV format, rather than GeoJSON.
(Specific station information, including location and ID, can be found in section 1.3.)
If the link does not work, the data can be accessed through the official website of the Government of Canada by following the navigation path outlined below: Home > Environment and natural resources > Climate change > Climate change: our plan > Adapting to Climate Change > Canadian Centre for Climate Services > Display and Download Climate Data > Climate data extraction tool > Daily climate data (the link above)
#### Vancouver (YVR)
year range: 1937-2024
| Station Name | Start Year | End Year | Station ID | Start Date | End Date |
|---|---|---|---|---|---|
| VANCOUVER INTL A | 1937 | 2024 | 1108447 | 1937-01-01 | 2024-08-01 |
| VANCOUVER INTL A | 1937 | 2024 | 1108395 | 2013-06-13 | 2024-08-01 |
#### Abbotsford
year range: 1935-2024
table:
| Station Name | Start Year | End Year | Station ID | Start Date | End Date |
|---|---|---|---|---|---|
| ABBOTSFORD A | 2012 | 2024 | 1100032 | 2012-06-21 | 2024-07-30 |
| ABBOTSFORD A | 1944 | 2024 | 1100030 | 1944-10-01 | 2012-06-20 |
| ABBOTSFORD UPPER SUMAS | 1935 | 1946 | 1100040 | 1935-11-01 | 1946-03-31 |
#### Prince George
year range: 1940-2024
table:
| Station Name | Start Year | End Year | Station ID | Start Date | End Date |
|---|---|---|---|---|---|
| PRINCE GEORGE | 1942 | 2009 | 1096450 | 1942-07-01 | 2009-10-21 |
| PRINCE GEORGE | 2009 | 2024 | 1096439 | 2009-10-22 | 2024-07-30 |
| PRINCE GEORGE | 1912 | 1945 | 1096436 | 1912-08-01 | 1945-06-30 |
#### Fort Nelson
year range: 1937-2024
table:
| Station Name | Start Year | End Year | Station ID | Start Date | End Date |
|---|---|---|---|---|---|
| FORT NELSON A | 1937 | 2012 | 1192940 | 1937-09-01 | 2012-11-14 |
| FORT NELSON A | 2012 | 2024 | 1192946 | 2012-11-08 | 2024-07-30 |
#### Kelowna
year range: 1899-2024
table:
| Station Name | Start Year | End Year | Station ID | Start Date | End Date |
|---|---|---|---|---|---|
| KELOWNA | 1899 | 1962 | 1123930 | 1899-03-01 | 1962-09-30 |
| KELOWNA | 1961 | 1969 | 1123975 | 1961-08-01 | 1969-11-30 |
| KELOWNA A | 1968 | 2005 | 1123970 | 1968-10-01 | 2005-09-30 |
| KELOWNA | 2005 | 2024 | 1123939 | 2005-09-03 | 2024-07-30 |
| KELOWNA UBCO | 2013 | 2024 | 1123996 | 2013-12-16 | 2024-07-30 |
Note that the total precipitation data from station 1123939 (2009 to 2024) may be incomplete.
Treating missing values as zero would significantly distort the count of consecutive dry days. To preserve data integrity, particularly for calculating the maximum number of consecutive dry days, missing precipitation values were therefore supplemented with data from the nearby secondary station KELOWNA UBCO (1123996).
Within the 2013-2024 period, the original Kelowna dataset contained 191 missing values, while the UBC-O dataset provided 343 complete data points. After supplementation, the Kelowna dataset contains 132 missing values, meaning that 59 missing values were filled.
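The supplementation and the dry-spell count can be sketched as follows. This is a minimal Python illustration, not the project's R code; the function names are hypothetical, and two choices are assumptions: a day counts as dry when precipitation is below 1 mm, and a missing value ends a dry spell instead of counting as a dry day.

```python
def fill_from_secondary(primary, secondary):
    """Fill missing (None) values in `primary` with same-day values from `secondary`.

    Both arguments map a day label (e.g. datetime.date or "YYYY-MM-DD") to
    precipitation in mm, with None marking a missing value.
    Returns (filled_series, number_of_values_filled).
    """
    filled = dict(primary)
    n_filled = 0
    for day, value in primary.items():
        backup = secondary.get(day)
        if value is None and backup is not None:
            filled[day] = backup
            n_filled += 1
    return filled, n_filled


def max_consecutive_dry_days(series, dry_threshold_mm=1.0):
    """Length of the longest run of dry days (precipitation < threshold).

    Assumes `series` has one entry per calendar day; a missing value (None)
    breaks the current run instead of being treated as zero precipitation.
    """
    longest = current = 0
    for day in sorted(series):
        value = series[day]
        if value is not None and value < dry_threshold_mm:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

On a series such as [0.0, None, 0.0, None, 5.0, 0.0] mm, counting the gaps as 0 mm would report a four-day dry spell, whereas the sketch reports one; this is the distortion the supplementation is meant to avoid.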
In the early stages of our analysis, we explored data from the following stations:
- Kamloops: 1939-2024
- Penticton: 1941-2024

For analyzing the relationship between temperature and field crop yield data, we picked a station from the Peace River region: Fort St. John (1910-2024).
The data are read with the process_and_save_data() function, located in
../climate_extreme_RA/R/read_data_<station name>.R (one script per station).
Within process_and_save_data(), deal_with_non_exist_date() detects missing dates, i.e., dates that should be present in a continuous daily time series but are absent. It then adds a row for each missing date and fills the other value columns with NA.
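For readers outside R, the same gap-filling idea can be sketched in Python. This is an illustrative analogue, not deal_with_non_exist_date() itself; fill_missing_dates is a hypothetical name, and representing each row as a dict keyed by LOCAL_DATE is an assumption.

```python
from datetime import date, timedelta


def fill_missing_dates(rows, value_columns):
    """Return rows covering every day from the first to the last LOCAL_DATE.

    `rows` is a list of dicts whose "LOCAL_DATE" is a datetime.date. Days that
    are absent get a new row with None (NA in R) in every value column.
    If a date appears twice, the later row wins.
    """
    by_date = {row["LOCAL_DATE"]: row for row in rows}
    day: date = min(by_date)
    last: date = max(by_date)
    completed = []
    while day <= last:
        if day in by_date:
            completed.append(by_date[day])
        else:
            completed.append({"LOCAL_DATE": day, **{col: None for col in value_columns}})
        day += timedelta(days=1)
    return completed
```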
Note: When the date ranges of different stations overlap, I manually remove the overlapping rows from the older station's data and keep the corresponding rows from the newer station's data. This is done before running process_and_save_data() to read the data.
For example, given two date ranges:
- Station A: October 1, 1944, to June 20, 2012
- Station B: November 1, 1935, to March 31, 1946

I keep the data from Station B for November 1, 1935, to September 30, 1944, and the data from Station A for October 1, 1944, to June 20, 2012. This adjusted data set then serves as the raw data.
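The manual splicing rule can be expressed as a small function. This Python sketch is illustrative only: splice_stations is a hypothetical name, and each station is assumed to be a mapping from date to value. It drops the older station's rows that fall inside the newer station's date range and keeps everything else.

```python
from datetime import date


def splice_stations(older: dict[date, float], newer: dict[date, float]) -> dict[date, float]:
    """Merge two daily station records, preferring `newer` where they overlap.

    Rows of `older` that fall inside the date range of `newer` are removed;
    the result is sorted by date.
    """
    first_new, last_new = min(newer), max(newer)
    kept_older = {day: v for day, v in older.items() if not first_new <= day <= last_new}
    combined = {**kept_older, **newer}
    return dict(sorted(combined.items()))
```

With Station B passed as `older` and Station A as `newer`, B's rows from October 1, 1944, to March 31, 1946, are dropped, matching the example above.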
The raw data path is specified at the beginning of each station's analysis R Markdown file, located at
../climate_extreme_RA/reports_station/<station name>_heatwave_analysis.Rmd
Link to data source: ERA5 Data
If the link does not work, the data can be accessed through the official website of the Copernicus Climate Change Service by following the navigation path outlined below:
Home > Data Set (in the search bar, search for “ERA5-Land monthly averaged data from 1950 to present”) > ERA5 Data
Step 2:
The “Overview” option provides a detailed explanation of each variable.
To download the dataset, click the “Download data” option.
Step 3: Select options and download the data
Then select the product type and variables as shown below:
Next, select the desired years and months, and make sure the “time” is set to 00:00.
For the area, the YVR region spans approximately from 49.5 (North) to 48.4 (South) and from -124.0 (West) to -122.0 (East). (We’ll cover how to define the region accurately using a visualization package in section 2.2.4.) The image below offers a helpful reference for the latitude and longitude range.
Then select the NetCDF format and click “Submit Form”.
Step 4: Check the download progress
Click the “Your requests” tab at the very top.
Note that the “Your requests” tab (between the “Applications” and “Toolbox” tabs) does not appear until you are registered and logged in to the website.
Once you create an account, you will be able to view your download requests on the website. Additionally, any API calls you execute through scripts will be reflected in this section.
If you click on the triangle next to the product name, you can view more details about this specific download request.
Once the request is completed, the data will be available for download via the green “Download” button.
The file will be in .nc (NetCDF) format.
Link to data source: ERA5 Data Hourly
If the link does not work, the data can be accessed through the official website of the Copernicus Climate Change Service by following the navigation path outlined below:
Home > Data Set (in the search bar, search for “ERA5-Land hourly data from 1950 to present”) > ERA5 Data
The “Overview” option provides a detailed explanation of each variable.
To download the dataset, click the “Download data” option.
Then select the variables as shown below:
Next, select the desired year, month, day, and hour.
Keep in mind that ERA5 reports hours in the UTC time zone.
For more detailed information, open the “Quality assessment” tab next to “Download data” and click the user guidance link.
Note that only one year can be selected at a time, which is why section 2.3 uses a script to batch-download data from 1950 to the present; otherwise, you would need to click through many selections manually.
For the area, the YVR region spans approximately from 49.5 (North) to 48.4 (South) and from -124.0 (West) to -122.0 (East). (We’ll cover how to define the region accurately using a visualization package in section 2.2.4.) The image below offers a helpful reference for the latitude and longitude range.
Then select the NetCDF format and click “Submit Form”.
Click the “Your requests” tab at the very top.
Note that the “Your requests” tab (between the “Applications” and “Toolbox” tabs) does not appear until you are registered and logged in to the website.
Once you create an account, you will be able to view your download requests on the website. Additionally, any API calls you execute through scripts will be reflected in this section.
If you click on the triangle next to the product name, you can view more details about this specific download request.
Once the request is completed, the data will be available for download via the green “Download” button.
The file will be in .nc (NetCDF) format.
To obtain daily data, we download the hourly data and aggregate it into daily values ourselves.
To do this efficiently across many years, we use era5cli, a command-line interface for downloading ERA5 data from the websites described in Sections 2.2.1 and 2.2.2, which lets us download hourly data in batches.
The era5cli library is recommended by Malinina and Gillett (2024) in the Data Availability section of their paper.
The steps for using this script are outlined below:
era5cli is introduced in its CLI usage documentation.
The library sends download requests to the two dataset pages (monthly and hourly data) from code; each generated API call is equivalent to a request made directly through the website. For more details, see the Dataset overview.
Therefore, once logged in to the website (Section 2.2.1 or 2.2.2), you can view and manage these download requests in the “Your requests” section.
When hourly data are requested for multiple years, the script submits several requests to the website at once.
Make sure era5cli and all necessary dependencies are installed; its installation guide gives clear step-by-step instructions.
Review the script to understand what it does and how it uses the era5cli library. A detailed explanation is given in “Formulating requests”.
For each request, we can choose different variables and parameters. They are explained under “Argument overview” (arguments for the area, time, etc.) and “Variable overview” (the available climate variables).
In our case, we choose ERA5-Land variables: 2m_temperature, total_precipitation, and 2m_dewpoint_temperature (the dewpoint is related to humidity).
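Relative humidity is not downloaded directly, but it can be derived from 2m_temperature and 2m_dewpoint_temperature. The text does not say which conversion we use, so the sketch below assumes the widely used Magnus approximation (coefficients 17.625 and 243.04 °C); relative_humidity is a hypothetical helper, and both inputs are in kelvin, as ERA5 stores them.

```python
import math

# Magnus-formula coefficients (temperatures in degrees Celsius)
MAGNUS_B = 17.625
MAGNUS_C = 243.04
KELVIN_OFFSET = 273.15


def relative_humidity(t2m_kelvin: float, d2m_kelvin: float) -> float:
    """Approximate relative humidity (%) from 2 m temperature and dewpoint in kelvin.

    RH = 100 * e_s(Td) / e_s(T), with e_s(T) = 6.1094 * exp(B * T / (C + T));
    the 6.1094 hPa prefactor cancels in the ratio.
    """
    t = t2m_kelvin - KELVIN_OFFSET
    td = d2m_kelvin - KELVIN_OFFSET
    return 100.0 * math.exp(MAGNUS_B * td / (MAGNUS_C + td) - MAGNUS_B * t / (MAGNUS_C + t))
```

For example, 30 °C air with a 15 °C dewpoint gives roughly 40 % relative humidity, and equal temperature and dewpoint give 100 %.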
Execute the script to begin downloading the hourly data. In our case, the script is:

```shell
era5cli hourly \
  --variables 2m_temperature \
  --land \
  --startyear 1950 \
  --endyear 2024 \
  --months 4 5 6 7 8 9 10 \
  --hours 0 1 18 19 20 21 22 23 \
  --area 49.5 -124.0 48.4 -122.0 \
  --merge
```

In a terminal, the same command can also be entered on a single line. We want to study summer temperatures from 11 am to 6 pm local time in BC, Canada, which corresponds to 18:00 to 01:00 UTC (Pacific Daylight Time is UTC-7), hence --hours 0 1 18 19 20 21 22 23.
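The hour list in the command can be derived rather than worked out by hand. This sketch assumes a fixed UTC-7 offset (Pacific Daylight Time), which holds for most of the April to October window in recent decades (historical daylight-saving rules differed); local_hours_to_utc is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time (UTC-7) for the April-October window
PDT = timezone(timedelta(hours=-7))


def local_hours_to_utc(local_hours, tz=PDT):
    """Map local clock hours to the UTC hours expected by era5cli --hours."""
    reference = datetime(2021, 7, 1)  # any date works with a fixed offset
    utc_hours = {
        reference.replace(hour=h, tzinfo=tz).astimezone(timezone.utc).hour
        for h in local_hours
    }
    return sorted(utc_hours)
```

Calling local_hours_to_utc(range(11, 19)) covers 11 am through 6 pm and returns the eight UTC hours used in the command.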
- --merge merges all years into one output file (by default, era5cli writes a separate file for every year).
- --land requests ERA5-Land data.
- --area takes the bounds in the order LAT_MAX LON_MIN LAT_MIN LON_MAX (north, west, south, east), matching the YVR bounds given earlier.

Recommendation: split the years into windows of, for instance, one or two decades, to avoid very long downloads and to limit the impact of a sudden interruption.
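Following this recommendation, one command per window can be generated. This is an illustrative Python helper (year_windows and era5cli_commands are hypothetical names) that only builds command strings with the same options as the command above; running them, for example one at a time with subprocess.run, is left to the user.

```python
import shlex


def year_windows(start_year: int, end_year: int, window: int = 10) -> list[tuple[int, int]]:
    """Split [start_year, end_year] into consecutive windows of at most `window` years."""
    return [
        (first, min(first + window - 1, end_year))
        for first in range(start_year, end_year + 1, window)
    ]


def era5cli_commands(start_year: int, end_year: int, window: int = 10) -> list[str]:
    """One era5cli command string per window, reusing the options shown above."""
    commands = []
    for first, last in year_windows(start_year, end_year, window):
        args = [
            "era5cli", "hourly",
            "--variables", "2m_temperature",
            "--land",
            "--startyear", str(first),
            "--endyear", str(last),
            "--months", "4", "5", "6", "7", "8", "9", "10",
            "--hours", "0", "1", "18", "19", "20", "21", "22", "23",
            "--area", "49.5", "-124.0", "48.4", "-122.0",
            "--merge",
        ]
        commands.append(shlex.join(args))  # shell-safe quoting (Python 3.8+)
    return commands
```

For 1950-2024 with ten-year windows this yields eight commands, the last covering 2020-2024; with --merge, each window produces its own merged file.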
After the script has run, check the “Your requests” section on the website to verify that the download requests were generated and are in progress.
After downloading the hourly climate data for each year with the script in Section 2.3, I gathered all NetCDF files into a single ZIP file located at ../climate_extreme_RA/R/era5/hourly_yvr.zip, which contains the hourly data for all years.
Next, the NetCDF files are read with Python in the Jupyter notebook
../climate_extreme_RA/R/era5/ERA5 hourly data.ipynb. The notebook is divided into three sections: data download, exploratory data analysis (EDA), and a sample check for location data.
In the Download section, the clean_and_wrangle_df() function cleans the data and converts the time zone from UTC to Vancouver time with the following command:

```python
df['time'] = df['time'].dt.tz_localize('UTC').dt.tz_convert('America/Vancouver')
```
Finally, the cleaned data are saved as the CSV file ../climate_extreme_RA/data/era5_YVR.csv.
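The notebook command converts the time zone, but the aggregation from hourly to daily values is not shown here. The following is a minimal Python sketch under stated assumptions, not the notebook's code: hourly_to_daily is a hypothetical name, t2m is taken to be in kelvin (as ERA5 stores it), and the daily statistics are the maximum, minimum, and mean. Each UTC timestamp is converted to local time before its calendar date is taken, so 01:00 UTC counts toward the previous local day. Because only the afternoon hours are downloaded, the minimum and mean describe those hours only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone, tzinfo

KELVIN_OFFSET = 273.15  # ERA5 stores t2m in kelvin

# Fixed UTC-7 offset for illustration; in practice pass
# zoneinfo.ZoneInfo("America/Vancouver") so daylight saving time is handled.
PDT = timezone(timedelta(hours=-7))


def hourly_to_daily(records: list[tuple[datetime, float]], local_tz: tzinfo) -> dict:
    """Aggregate (UTC timestamp, t2m in K) pairs into local calendar days.

    Returns {local_date: {"max", "min", "mean", "n_hours"}} in degrees Celsius.
    """
    temps_by_day = defaultdict(list)
    for utc_time, t2m_kelvin in records:
        if utc_time.tzinfo is None:
            # ERA5 timestamps are UTC; make that explicit for naive datetimes
            utc_time = utc_time.replace(tzinfo=timezone.utc)
        local_day = utc_time.astimezone(local_tz).date()
        temps_by_day[local_day].append(t2m_kelvin - KELVIN_OFFSET)
    return {
        day: {
            "max": max(temps),
            "min": min(temps),
            "mean": sum(temps) / len(temps),
            "n_hours": len(temps),
        }
        for day, temps in sorted(temps_by_day.items())
    }
```

The fixed offset keeps the example independent of the system's time-zone database; passing ZoneInfo("America/Vancouver") instead applies the same rules as the notebook's tz_convert call.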
In the Sample Check for Location section, Plotly Express is used to visualize the temperature data at grid locations within Vancouver. The following code generates an interactive map:

```python
import plotly.express as px

# Basic Mapbox style (carto-positron needs no Mapbox access token)
mapbox_style = "carto-positron"

# Scatter plot on a map, with colors based on the temperature values (t2m)
fig = px.scatter_mapbox(
    df,
    lat="latitude",
    lon="longitude",
    hover_name="t2m",
    color="t2m",  # use temperature values to determine colors
    zoom=10,
    mapbox_style=mapbox_style,
)
fig.update_layout(title="Vancouver Temperature Distribution by Location")
fig.show()
```
This plot can be zoomed in and out to examine the temperature distribution across grid points, each corresponding to a specific latitude and longitude. The data can then be filtered to isolate the exact grid points of interest.
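One way to isolate the grid point that best represents a station is to take the grid point with the smallest great-circle distance to it. This is a sketch under assumptions: nearest_grid_point and haversine_km are hypothetical helpers, and the YVR coordinates in the usage note (about 49.19 N, 123.18 W) are approximate; the exact station location is listed in section 1.3.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_grid_point(grid_points, station_lat: float, station_lon: float):
    """Return the (lat, lon) grid point closest to the station."""
    return min(grid_points, key=lambda p: haversine_km(p[0], p[1], station_lat, station_lon))
```

In practice, grid_points would be the unique latitude/longitude pairs in the converted CSV, e.g. nearest_grid_point(grid, 49.19, -123.18).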
After the NetCDF data are converted to CSV with the Python notebook, they are read into R for further analysis.
The R script for this step is located at
../climate_extreme_RA/R/era5/YVR_read_wrangling_data.R.
The data focus on a specific grid point that closely matches the YVR temperature station in Vancouver, BC. The following R code illustrates this step:

```r
library(dplyr)
library(purrr)

# Define file paths
file_paths <- c("../data/era5_YVR.csv")

# Read a file (column selection is currently commented out)
read_and_select <- function(file_path) {
  read.csv(file_path)
  # %>% select(all_of(needed_columns))
}

# Read and combine all datasets
era5_yvr <- map_dfr(file_paths, read_and_select)

# Display the first few rows of the combined data
head(era5_yvr)

# Sort the rows by date
ordered_era5_yvr <- era5_yvr %>%
  arrange(date)
```
The resulting ordered_era5_yvr data frame holds the data for the grid point that most accurately represents the YVR temperature station, sorted by date for subsequent analysis.
Link to data source: Statistics Canada
If the link does not work, the data can be accessed through the official Statistics Canada website by following the navigation path outlined below:
Home > Data (in the search bar, search for “Area, production and farm value of potatoes” or Dataset ID: 32-10-0358-01) > Area, production and farm value of potatoes
Step 1: Click the Add/Remove button.
Step 2: The picture below shows the layout of the column filter options.
Step 2.1: Select the geography:
Click on the Geography tab.
Don’t check the box next to “Canada”, as it would include nationwide data for the entire country.
Expand the list by clicking the “+” symbol next to “Canada” and select the desired province: British Columbia (BC).
Step 2.2: Choose the variables:
Navigate to the Area, production and farm value of potatoes tab.
From the list of available variables, select only “Average yield, potatoes”.
Step 2.3: Click on the Reference period tab.
Step 2.4: Customize the layout:
Adjust the layout settings as preferred.
Note that these settings only affect how the dataset is displayed on the website; they do not change the structure of the downloaded data.
Step 3: Download the dataset.
The data are read in ../climate_extreme_RA/R_agricultural/read_data.R:

```r
library(readr)
library(dplyr)
library(purrr)

# File path for the potato data
file_paths <- c("../data/agri/Potato_Data.csv")

# Columns needed: UOM is the unit, VALUE is the yield
needed_columns <- c("REF_DATE", "VALUE", "UOM")

# Read a file and keep only the needed columns
read_and_select_pot <- function(file_path) {
  read_csv(file_path) %>%
    select(all_of(needed_columns))
}

# Read and combine all datasets
data_pot <- map_dfr(file_paths, read_and_select_pot)

# Create a new column for the crop type
data_pot$Crop_Type <- "Potato"
```
The script ../climate_extreme_RA/R_agricultural/model_potato.R uses a moving average to detrend the yield time series (transforming the blue line into the red line in the figure).
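The window length and the exact detrending formula are not given here, so the following Python sketch assumes a centered 5-year moving average (shrinking at the ends of the series) that is subtracted from the observed yield; some studies divide by the trend instead. The function names are hypothetical.

```python
def centered_moving_average(values: list[float], window: int = 5) -> list[float]:
    """Centered moving average; the window shrinks near both ends of the series.

    `window` should be odd so that each average is centered on its year.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def detrend(values: list[float], window: int = 5) -> list[float]:
    """Yield anomalies: observed yield minus its moving-average trend."""
    trend = centered_moving_average(values, window)
    return [v - t for v, t in zip(values, trend)]
```

A purely linear yield series has zero anomalies away from its ends, which is a quick sanity check that only the year-to-year variation remains after detrending.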
The functions in this file are then used as follows: lm_monthly_potato and lm_season_potato fit linear regressions of the annual potato yield across British Columbia on EHF, using Kelowna's daily maximum EHF, aggregated into monthly or seasonal maxima for each year, as the independent variables.
In addition, lm_onemonth_potato fits 12 single-month linear regressions of yield on EHF.
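Each single-month model is an ordinary least-squares fit with one predictor. The Python sketch below shows the two ingredients described above: reducing daily EHF to a monthly maximum, and computing the adjusted R-squared of a one-predictor regression. It mirrors what R's lm() reports for such a model but is not the project's code; monthly_max and simple_ols are hypothetical names.

```python
from collections import defaultdict
from datetime import date


def monthly_max(daily_ehf: dict[date, float]) -> dict[tuple[int, int], float]:
    """Reduce daily EHF values to the maximum EHF of each (year, month)."""
    maxima: dict[tuple[int, int], float] = defaultdict(lambda: float("-inf"))
    for day, value in daily_ehf.items():
        key = (day.year, day.month)
        maxima[key] = max(maxima[key], value)
    return dict(maxima)


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float, float, float]:
    """Least-squares fit of y = a + b * x with one predictor.

    Returns (intercept a, slope b, R-squared, adjusted R-squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared for one predictor: (n - 1) / (n - 2) degrees-of-freedom ratio
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return a, b, r2, adj_r2
```

Here x would hold one month's maximum EHF for each year and y the corresponding annual yield; with only a few dozen years, the adjustment noticeably lowers R-squared when the fit is weak.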
The adjusted R-squared values are low for all linear models, and the correlation seen in the scatter plots varies and is difficult to discern.
Upon examining the daily EHF curves from the 40 highest and 40 lowest yield years, no clear pattern distinguishes the curves between high and low yield years.
The detailed analysis can be found in the reports
../climate_extreme_RA/reports_station/Kelowna_heatwave_analysis.Rmd
and
../climate_extreme_RA/report_agricultural/agri_lm_model.Rmd
(Part 2: Potato data).
Dataset ID: “Table: 32-10-0364-01 (formerly CANSIM 001-0009)”
Year Range: 1926-2024
Frequency: Annual
Variables:
- Marketed production (tons), 1926-2024
- Total cultivated area (hectares), 2002-2024

Geography: Canada, Province-wide: British Columbia (BC); Other provinces
To investigate the potential correlation between temperature extremes, including heat waves, and long-term yield patterns in British Columbia, a comprehensive yield dataset is required.
Because marketed production is available from 1926 to 2024 but the total cultivated area only from 2002, the cultivated area must be estimated for the years before 2002 to cover the longer period.
Link to data source: Statistics Canada
If the link does not work, the data can be accessed through the official Statistics Canada website by following the navigation path outlined below:
Home > Data (in the search bar, search for “Area, production and farm gate value of marketed fruits” or Dataset ID: 32-10-0364-01) > Area, production and farm value of marketed fruits
Step 1: Click the Add/Remove button.
Step 2: The picture below shows the layout of the column filter options.
Step 2.1: Select the geography:
Click on the Geography tab.
Don’t check the box next to “Canada”, as it would include nationwide data for the entire country.
Expand the list by clicking the “+” symbol next to “Canada” and select the desired province: British Columbia (BC).
Step 2.2: Choose the variables:
Navigate to the Estimates tab.
From the list of available variables, select Marketed production and Cultivated area, total.
Step 2.3: Choose the Commodity tab:
Check the box next to each type of fruit you want.
Important: To view all types of fruits, enter a sufficiently large number in the “Show XXX Members” tab; otherwise, some fruit types may be hidden.
Step 2.4: Click on the Farm area, production, value tab.
Step 2.5: Click on the Reference period tab.
Step 2.6: Customize the layout:
Adjust the layout settings as preferred.
Note that these settings only affect how the dataset is displayed on the website; they do not change the structure of the downloaded data.
Step 3: Download the dataset.
The data are read in ../climate_extreme_RA/R_agricultural/read_data.R:

```r
library(readr)
library(dplyr)
library(purrr)

# File paths for the fruit data (marketed production and cultivated area)
file_paths <- c("../data/agri/Fruit_prod.csv",
                "../data/agri/Fruit_area.csv")

# Columns needed (production is in tons)
needed_columns <- c("REF_DATE", "VALUE", "Commodity", "Estimates", "UOM")

# Read a file, keep the needed columns, and rename Commodity to Crop_Type
read_and_select_fru <- function(file_path) {
  read_csv(file_path) %>%
    select(all_of(needed_columns)) %>%
    rename(Crop_Type = Commodity)
}

# Read and combine all datasets
data_statcan_fruit <- map_dfr(file_paths, read_and_select_fru)

# Convert the data types
data_statcan_fruit$REF_DATE <- as.integer(data_statcan_fruit$REF_DATE)
data_statcan_fruit$VALUE <- as.numeric(data_statcan_fruit$VALUE)

# Check the number of missing values
sum(is.na(data_statcan_fruit))
```
In ../climate_extreme_RA/report_agricultural/FAOSTAT.Rmd, Part 4: New Crop Data, Section 4.2: Fruits Data, we present:
- A line plot of all fruit yield data from 2002 to 2024.
- A line plot of marketed production (in tons) and total cultivated area (in hectares).
- A line plot of the total cultivated area (in hectares), with the overall mean value applied to the years before 2002.
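The third plot implies a simple estimate of yield before 2002: marketed production divided by the mean cultivated area observed from 2002 onward. A Python sketch under that reading (yield_with_mean_area is a hypothetical name; units follow the table, tons and hectares):

```python
def yield_with_mean_area(production: dict[int, float], area: dict[int, float],
                         first_area_year: int = 2002) -> dict[int, float]:
    """Yield per year = production / cultivated area.

    Years before `first_area_year` use the mean of the observed areas;
    years with missing production or area are skipped.
    """
    observed = [a for year, a in area.items() if year >= first_area_year and a]
    mean_area = sum(observed) / len(observed)
    yields = {}
    for year, prod in sorted(production.items()):
        hectares = mean_area if year < first_area_year else area.get(year)
        if prod is not None and hectares:
            yields[year] = prod / hectares
    return yields
```

Because a constant area is assumed before 2002, any yield variation in those years comes entirely from production; this is one reason additional area data would strengthen the analysis.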
We could analyze the yield data from the past 20 years in relation to EHF to identify potential patterns. This approach may provide insights into the correlation between extreme heat events and yield fluctuations.
However, a more comprehensive analysis requires additional area data.
Dataset ID: “Table: 32-10-0365-01 (formerly CANSIM 001-0013)”
Year Range: 1940-2024
Frequency: Annual
Variables:
- Average yield per hectare (kilograms)
- Average yield per acre (pounds)
- Area planted (acres)
- Area planted (hectares)
- Area harvested (hectares)
- Area harvested (acres)
- Marketed production (tons)
- Marketed production (metric tonnes)
- Total production (tons)
- Total production (metric tonnes)

Comment: the yield data cover 1940-2017 only.
Geography: Canada, Province-wide: British Columbia (BC); Other provinces
To investigate the potential correlation between temperature extremes, including heat waves, and long-term yield patterns in British Columbia, a comprehensive yield dataset is required.
Therefore, the yields for the years after 2017, up to 2024, must be estimated.
Link to data source: Statistics Canada
If the link does not work, the data can be accessed through the official Statistics Canada website by following the navigation path outlined below:
Home > Data (in the search bar, search for “Area, production and farm gate value of marketed vegetables” or Dataset ID: 32-10-0365-01) > Area, production and farm value of marketed vegetables
Step 1: Click the Add/Remove button.
Step 2: The picture below shows the layout of the column filter options.
Step 2.1: Select the geography:
Click on the Geography tab.
Don’t check the box next to “Canada”, as it would include nationwide data for the entire country.
Expand the list by clicking the “+” symbol next to “Canada” and select the desired province: British Columbia (BC).
Step 2.2: Choose the variables:
Navigate to the Estimates tab.
From the list of available variables, select all except Farm gate value (dollars).
Step 2.3: Choose the Commodity tab:
Check the box next to each type of vegetable you want.
Important: To view all types of vegetables, enter a sufficiently large number in the “Show XXX Members” tab; otherwise, some vegetable types may be hidden.
Step 2.4: Click on the Reference period tab.
Step 2.5: Customize the layout:
Adjust the layout settings as preferred.
Note that these settings only affect how the dataset is displayed on the website; they do not change the structure of the downloaded data.
Step 3: Download the dataset.
The data are read in ../climate_extreme_RA/R_agricultural/read_data.R:

```r
library(dplyr)
library(purrr)
library(zoo)  # na.approx()

# File path for the vegetable data
file_paths <- c("../data/agri/Vegetable_Data.csv")

# Columns needed
needed_columns <- c("REF_DATE", "Estimates", "VALUE", "Commodity", "UOM")

# Read a file, keep the needed columns, and rename Commodity to Crop_Type
read_and_select <- function(file_path) {
  read.csv(file_path) %>%
    select(all_of(needed_columns)) %>%
    rename(Crop_Type = Commodity)
}

# Read and combine all datasets
data_veg <- map_dfr(file_paths, read_and_select)

# Number of missing values before imputation
sum(is.na(data_veg$VALUE))

# Impute missing VALUEs by linear interpolation between neighbouring years
# within each crop and estimate; rule = 2 fills leading/trailing gaps with
# the nearest observed value. Rows are assumed to be in chronological order.
data_veg <- data_veg %>%
  group_by(Crop_Type, Estimates) %>%
  mutate(VALUE = na.approx(VALUE, rule = 2)) %>%
  ungroup()

# Number of missing values after imputation
sum(is.na(data_veg$VALUE))

# Clean Crop_Type: drop the "Fresh " prefix and bracketed codes such as " [...]"
data_veg$Crop_Type <- gsub("Fresh ", "", data_veg$Crop_Type)
data_veg$Crop_Type <- gsub("\\s*\\[.*\\]", "", data_veg$Crop_Type)

# Number of crop types
length(unique(data_veg$Crop_Type))

# Filter out crops without long-term data
short_term_crops <- c(
  "broccoli", "Brussels sprouts", "eggplants (except Chinese eggplants)",
  "French shallots and green onions", "garlic", "Other fresh fine herbs",
  "Other fresh melons", "Other fresh vegetables", "parsley", "peppers",
  "pumpkins", "radishes", "squash and zucchini", "sweet potatoes",
  "Total fresh vegetables", "watermelons", "rhubarb", "leeks", "tomatoes",
  "cucumbers and fresh gherkins (all varieties)"
)
data_veg_1940 <- data_veg %>%
  filter(!Crop_Type %in% short_term_crops)
length(unique(data_veg_1940$Crop_Type))

# First and last year available for each crop and estimate
veg_year_2 <- data_veg_1940 %>%
  group_by(Crop_Type, Estimates) %>%
  summarize(Start_Year = min(REF_DATE), End_Year = max(REF_DATE))
```
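zoo::na.approx() performs linear interpolation, so an isolated gap receives the mean of its two neighbours, while longer gaps are filled along a straight line. The Python sketch below reproduces that behaviour together with rule = 2 at the series edges; interpolate_rule2 is a hypothetical name, and positions are treated as equally spaced years (na.approx uses the element index for a plain vector).

```python
import bisect


def interpolate_rule2(values: list) -> list:
    """Fill None gaps like zoo::na.approx(x, rule = 2) on an equally spaced series.

    Interior gaps are linearly interpolated between the nearest observed
    neighbours; leading and trailing gaps take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed values to interpolate")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        if i < known[0]:
            filled[i] = values[known[0]]
        elif i > known[-1]:
            filled[i] = values[known[-1]]
        else:
            pos = bisect.bisect_left(known, i)
            left, right = known[pos - 1], known[pos]
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled
```

Like na.approx, the sketch fails for a group with fewer than two observed values, so very sparse crop series need to be filtered out before imputation.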
The script ../climate_extreme_RA/R_agricultural/explore_qualified_crop.R calculates and fills in the average yield values for the years 2017 to 2024, which are missing from the original yield data, using the calculate_yields() function.
This involves two types of production data and two types of area data, each reported in two different units.
We calculate the yields for all combinations and then compare them with the original yield data from 1940 to 2017.
Based on this comparison, we determined that Marketed production (tons, converted to kg) and Area harvested (hectares) should be used.
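The comparison can be sketched as follows. The error metric is not specified in the text, so this Python example assumes the mean absolute relative error over the overlapping years; it also assumes Statistics Canada's “tons” are short tons (907.18474 kg). The function names are hypothetical, and the inputs to best_combination are assumed to be already converted to kg and hectares.

```python
TON_TO_KG = 907.18474        # assumption: "tons" are short tons (2,000 lb)
TONNE_TO_KG = 1000.0         # metric tonne
ACRE_TO_HA = 0.40468564224   # international acre


def convert(series: dict[int, float], factor: float) -> dict[int, float]:
    """Multiply every value of a {year: value} series by a unit factor."""
    return {year: value * factor for year, value in series.items()}


def yields_kg_per_ha(production_kg: dict[int, float], area_ha: dict[int, float]) -> dict[int, float]:
    """Yield for the years present in both series (years with zero or missing area are skipped)."""
    return {y: production_kg[y] / area_ha[y] for y in production_kg if area_ha.get(y)}


def mean_abs_rel_error(estimate: dict[int, float], reference: dict[int, float]) -> float:
    """Mean absolute relative error over the years both series share."""
    years = [y for y in estimate if reference.get(y)]
    if not years:
        return float("inf")
    return sum(abs(estimate[y] - reference[y]) / reference[y] for y in years) / len(years)


def best_combination(productions, areas, historical_yield):
    """Choose the (production, area) pair whose yield best matches the historical yield.

    `productions` maps a name to {year: kg}; `areas` maps a name to {year: ha}.
    Returns (production_name, area_name, error).
    """
    scored = [
        (mean_abs_rel_error(yields_kg_per_ha(prod, area), historical_yield), p_name, a_name)
        for p_name, prod in productions.items()
        for a_name, area in areas.items()
    ]
    error, p_name, a_name = min(scored)
    return p_name, a_name, error
```

Once the best pair is chosen on the 1940-2017 overlap, the same yields_kg_per_ha call extends the series to the years without published yields.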
After selecting the appropriate columns and calculating the yields for 2017 to 2024, we analyze the results in
../climate_extreme_RA/report_agricultural/agri_lm_model.Rmd.
Part 1: Vegetation Data presents a comparison of the calculated yields with the original historical yields.
Crops will be classified into five groups based on the quality of the simulated yield.
Good quality will be defined as a perfect match between the calculated and historical yields, while poor quality will be indicated by outliers or significant discrepancies between the two yield series.
Further research is needed to determine which types of fruits are strongly affected by heat waves or cold waves. In addition, the climate data must be aligned with the full agricultural cycle of each crop.
For instance, if the 2021 yield data for Crop A correspond to planting in March 2021 and harvesting in October 2021, the climate data for 2021 are used, because the entire crop cycle falls within the same calendar year.
However, if the 2021 yield data pertain to crops planted in November or December 2020 and harvested in May 2021, the climate “year” should span the period from the planting season in late 2020 through the harvest in 2021.
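The alignment rule in the two examples can be written as a function that returns the climate-data window for a given yield year. This sketch assumes whole calendar months and a single planting month and harvest month per crop; climate_window is a hypothetical name.

```python
import calendar
from datetime import date


def climate_window(yield_year: int, planting_month: int, harvest_month: int) -> tuple[date, date]:
    """First and last day of the climate data that match a crop's yield year.

    When planting comes later in the calendar than harvest (e.g. November
    planting, May harvest), the window starts in the previous year.
    """
    start_year = yield_year - 1 if planting_month > harvest_month else yield_year
    last_day = calendar.monthrange(yield_year, harvest_month)[1]
    return date(start_year, planting_month, 1), date(yield_year, harvest_month, last_day)
```

For Crop A, climate_window(2021, 3, 10) spans March 1 to October 31, 2021; for the November-planted crop, climate_window(2021, 11, 5) spans November 1, 2020, to May 31, 2021.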
Dataset ID: “Table: 32-10-0002-01 (formerly CANSIM 001-0071)”
Year Range: 1990-2024
Frequency: Annual
Variables:
- Average yield per hectare (kilograms)
- Average yield per acre (pounds)

Geography: Canada, Province-wide: British Columbia (BC); Other provinces
Link to data source: Statistics Canada
If the link does not work, the data can be accessed through the official Statistics Canada website by following the navigation path outlined below:
Home > Data (in the search bar, search for “Estimated areas, yield and production of principal field crops by Small Area Data Regions, in metric and imperial units” or Dataset ID: 32-10-0002-01) > Estimated areas, yield and production of principal field crops by Small Area Data Regions, in metric and imperial units
Step 1: Click the Add/Remove button.
Step 2: The picture below shows the layout of the column filter options.
Step 2.1: Select the geography:
Click on the Geography tab.
Check the box next to “British Columbia”.
Expand the list by clicking the “+” symbol next to “Canada” and select the remaining regions.
We picked the Fort St. John station (1910-2024) from the Peace River region for the analysis.
Step 2.2: Choose the variables:
Navigate to the Harvest disposition tab.
From the list of available variables, select Average yield.
Step 2.3: Choose the Type of crop tab:
Check the boxes next to “Oat”, “Barley”, “Peas, dry”, “Rye, fall remaining”, “Wheat, spring”, and “Canola”, as the other crops are missing for BC.
Important: To view all types of crops, enter a sufficiently large number in the “Show XXX Members” tab; otherwise, some crop types may be hidden.
Step 2.4: Click on the Reference period tab.
Step 2.5: Customize the layout:
Adjust the layout settings as preferred.
Note that these settings only affect how the dataset is displayed on the website; they do not change the structure of the downloaded data.
Step 3: Download the dataset.
The data are read in ../climate_extreme_RA/R_agricultural/read_data.R:

```r
library(dplyr)
library(purrr)
library(zoo)  # na.approx()

# File path for the field crop data
file_paths <- c("../data/agri/FieldCropsEstimates.csv")

# Columns needed
needed_columns <- c("REF_DATE", "Harvest.disposition", "VALUE", "GEO", "Type.of.crop", "UOM")

# Read a file, keep the needed columns, and rename Type.of.crop to Crop_Type
read_and_select <- function(file_path) {
  read.csv(file_path) %>%
    select(all_of(needed_columns)) %>%
    rename(Crop_Type = Type.of.crop)
}

# Read and combine all datasets
data_statcan_crop <- map_dfr(file_paths, read_and_select)

# Split into average yield (kg per hectare) and production (metric tonnes)
crop_yield <- data_statcan_crop %>%
  filter(Harvest.disposition == "Average yield (kilograms per hectare)") %>%
  rename(yield = VALUE)
total_produ <- data_statcan_crop %>%
  filter(Harvest.disposition == "Production (metric tonnes)") %>%
  rename(Production = VALUE)

# Impute missing values by linear interpolation within each crop and region;
# rule = 2 fills leading/trailing gaps with the nearest observed value
crop_yield <- crop_yield %>%
  group_by(Crop_Type, GEO) %>%
  mutate(yield = na.approx(yield, rule = 2)) %>%
  ungroup()
total_produ <- total_produ %>%
  group_by(Crop_Type, GEO) %>%
  mutate(Production = na.approx(Production, rule = 2)) %>%
  ungroup()
```